-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use -O2
to compile libgumbo and the nokogiri extension
#2639
Conversation
c30b63b
to
9e433fc
Compare
My benchmarks show this generates code that is: - 22% faster at HTML5 serialization - 93% faster at HTML5 parsing Note that `-O3` generates slightly slower HTML5 serialization code (see 8220dc7). Note also that this doesn't change the compiler options used for libxml2 and libxslt (which includes `-O2` already). Using the following benchmark script: #! /usr/bin/env ruby # coding: utf-8 require "bundler/inline" gemfile do source "https://rubygems.org" gem "nokogiri", path: "." gem "benchmark-ips" end require "nokogiri" require "benchmark/ips" input = File.read("test/files/tlm.html") puts "input #{input.length} bytes" html4_doc = Nokogiri::HTML4::Document.parse(input) html5_doc = Nokogiri::HTML5::Document.parse(input) puts RUBY_DESCRIPTION Benchmark.ips do |x| x.time = 10 x.report("html5 parse") do Nokogiri::HTML5::Document.parse(input) end x.report("html4 parse") do Nokogiri::HTML4::Document.parse(input) end x.compare! end Benchmark.ips do |x| x.time = 10 x.report("html5 serialize") do html5_doc.to_html end x.report("html4 serialize") do html4_doc.to_html end x.compare! end with default settings on my dev system (which are `-O3` for extension and unspecified for libgumbo): > input 70095 bytes > ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] > Warming up -------------------------------------- > html5 parse 12.000 i/100ms > html4 parse 31.000 i/100ms > Calculating ------------------------------------- > html5 parse 129.637 (±16.2%) i/s - 1.260k in 10.051475s > html4 parse 355.723 (±21.4%) i/s - 3.441k in 10.104502s > > Comparison: > html4 parse: 355.7 i/s > html5 parse: 129.6 i/s - 2.74x (± 0.00) slower > > Warming up -------------------------------------- > html5 serialize 85.000 i/100ms > html4 serialize 131.000 i/100ms > Calculating ------------------------------------- > html5 serialize 843.993 (± 2.4%) i/s - 8.500k in 10.076902s > html4 serialize 1.319k (± 2.9%) i/s - 13.231k in 10.039827s > > Comparison: > html4 serialize: 1319.0 i/s > html5 serialize: 844.0 i/s - 1.56x (± 0.00) slower after enabling `-O2` on both gumbo and nokogiri source files: > input 70095 bytes > ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] > Warming up -------------------------------------- > html5 parse 21.000 i/100ms > html4 parse 36.000 i/100ms > Calculating ------------------------------------- > html5 parse 250.245 (±20.8%) i/s - 2.394k in 10.066381s > html4 parse 371.905 (±20.2%) i/s - 3.600k in 10.025980s > > Comparison: > html4 parse: 371.9 i/s > html5 parse: 250.2 i/s - same-ish: difference falls within error > > Warming up -------------------------------------- > html5 serialize 101.000 i/100ms > html4 serialize 128.000 i/100ms > Calculating ------------------------------------- > html5 serialize 1.037k (± 3.3%) i/s - 10.403k in 10.042146s > html4 serialize 1.301k (± 4.2%) i/s - 13.056k in 10.055585s > > Comparison: > html4 serialize: 1300.8 i/s > html5 serialize: 1037.2 i/s - 1.25x (± 0.00) slower
9e433fc
to
ece08bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm a little surprised we weren't already using -O2
or -O3
.
@stevecheckoway Yeah, this was a rather obvious (in hindsight) missed opportunity. |
@flavorjones Can it be backported to 1.13.x ? for ruby 2.6.x ? |
@choosen No, sorry, I won't be backporting this change. Patch releases are intended for security and bug fixes, please see the section in the README on our interpretation of semantic versioning. |
What problem is this PR intended to solve?
My benchmarks show this generates code that is:
Note that
-O3
generates slightly slower HTML5 serializationcode (see 8220dc7).
Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which includes
-O2
already).Using the following benchmark script:
with default settings on my dev system
(which are
-O3
for extension and unspecified for libgumbo):after enabling
-O2
on both gumbo and nokogiri source files: